
Add Cortex-M as a first-class target in aot_arm_compiler #17075

Open
psiddh wants to merge 2 commits into pytorch:main from psiddh:main

Conversation

@psiddh
Contributor

@psiddh psiddh commented Jan 30, 2026

Previously, Cortex-M op conversion was applied as an afterthought to all
non-vgf targets via transform_for_cortex_m_backend(). This made the flow
hard to follow, used a bare EdgeCompileConfig that decomposed ops like
linear into addmm (requiring unnecessary workarounds), and didn't use the
CortexMQuantizer or CortexMPassManager.

Add a dedicated to_edge_cortex_m() path, selected via --target=cortex-m, that
owns the full pipeline: CortexMQuantizer for INT8 quantization, a correct
EdgeCompileConfig with preserve_ops to prevent premature decomposition, and
CortexMPassManager.pass_list for op conversion. Remove the old scattered
transform_for_cortex_m_backend() function.
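For illustration, a minimal sketch of what such a dedicated path can look like, assuming the standard PT2E quantization flow and the executorch.exir to_edge API; the PR's actual implementation may differ:

    import torch
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
    from torch.export import export
    from executorch.exir import EdgeCompileConfig, to_edge

    def to_edge_cortex_m(model, example_inputs, quantizer, preserve_ops, passes):
        # 1. INT8 quantization (quantizer would be a CortexMQuantizer instance).
        graph_module = export(model, example_inputs).module()
        prepared = prepare_pt2e(graph_module, quantizer)
        prepared(*example_inputs)  # calibration run
        quantized = convert_pt2e(prepared)

        # 2. Lower to edge, preserving ops (e.g. aten.linear) so they are not
        #    prematurely decomposed into addmm.
        edge = to_edge(
            export(quantized, example_inputs),
            compile_config=EdgeCompileConfig(preserve_ops=preserve_ops),
        )

        # 3. Apply the Cortex-M op-conversion passes (instantiated pass objects).
        return edge.transform(passes)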

Verified all ops fully lowered to cortex_m::quantized_* operators for both
MobileNetV2 (70 nodes) and MobileNetV3 (122 nodes). E2E inference tested
on Alif E8 board.

Test Plan:
- python3 -m examples.arm.aot_arm_compiler -m mv2 --target=cortex-m --quantize --intermediates=./mv2_intermediates --output=./mv2_cortex_m.pte
- python3 -m examples.arm.aot_arm_compiler -m mv3 --target=cortex-m --quantize --intermediates=./mv3_intermediates --output=./mv3_cortex_m.pte

Also ran E2E inference on Alif E8 board

@pytorch-bot

pytorch-bot bot commented Jan 30, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17075

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 3 Cancelled Jobs, 2 Unrelated Failures

As of commit e6fd05b with merge base f06a1f6:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 30, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@psiddh psiddh force-pushed the main branch 6 times, most recently from 39666cd to 7f14a9d Compare February 4, 2026 09:06
@zingo zingo added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk module: microcontrollers For embedded MCUs like Cortex-M, or RTOS like Zephyr, does not track NPU backend like Arm Ethos. labels Feb 5, 2026
@psiddh psiddh force-pushed the main branch 5 times, most recently from 1b64ef3 to 41462be Compare February 6, 2026 07:48
)

# Cortex-m ops are never included in vgf or direct-drive
if args.target != "vgf" and not args.direct_drive:
@psiddh
Contributor Author
Should TOSA targets even have a CortexM fallback? (--target=u55/u85 → TOSA delegation)

@psiddh psiddh changed the title from "Summary:MV2 CortexM PassManager changes for Alif E8" to "Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend via Aot Compiler script" Feb 6, 2026
@psiddh psiddh marked this pull request as ready for review February 6, 2026 07:56
@psiddh psiddh requested a review from digantdesai as a code owner February 6, 2026 07:56
Copilot AI review requested due to automatic review settings February 6, 2026 07:56
Contributor

Copilot AI left a comment


Pull request overview

This PR enables full MobileNetV2 lowering to the CMSIS-NN backend for Cortex-M microcontrollers by implementing comprehensive support for quantized operations through a dedicated compilation path. The changes replace the previous delegation-based approach with a portable kernel-based architecture that converts all quantized operations to cortex_m::* operators.

Changes:

  • Added dedicated Cortex-M compilation path (to_edge_cortex_m) in the AOT compiler with CortexMQuantizer-based quantization
  • Implemented addmm operator support for decomposed linear layers through new _get_addmm_replacement method
  • Enhanced quantization parameter propagation with new PropagateQParamsPass and passthrough op handling in FoldAndAnnotateQParamsPass
  • Extended quantizer to mark parameter nodes as annotated and added passthrough ops (hardtanh, max_pool2d, dropout)

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

  • examples/arm/aot_arm_compiler.py: Adds the to_edge_cortex_m function for the Cortex-M compilation path using CortexMQuantizer and removes the old transform_for_cortex_m_backend function
  • backends/cortex_m/quantizer/quantizer.py: Adds the _mark_param_node_as_annotated method and extends the passthrough ops list for MobileNetV2 support
  • backends/cortex_m/passes/propagate_qparams_pass.py: New pass to propagate qparams through passthrough ops (transpose/permute) to consumer nodes like addmm
  • backends/cortex_m/passes/cortex_m_pass_manager.py: Adds PropagateQParamsPass and DecomposeAdaptiveAvgPool2dPass to the pass list and adds a skip_passes parameter to __init__
  • backends/cortex_m/passes/convert_to_cortex_m_pass.py: Implements the _get_addmm_replacement method to convert decomposed linear (addmm) operations to cortex_m.quantized_linear
  • backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py: Adds passthrough ops (hardtanh, relu, clamp) support and second-pass qparams propagation logic


@psiddh psiddh force-pushed the main branch 2 times, most recently from d7d85fb to b222911 Compare February 6, 2026 09:09
Copilot AI review requested due to automatic review settings February 6, 2026 09:09
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.



Comment on lines 358 to 372
    def _mark_param_node_as_annotated(self, node: Node) -> None:
        """
        Mark a weight/bias parameter node as annotated.

        This is necessary for FoldAndAnnotateQParamsPass to recognize the node
        as part of a quantized computation path. The ARM quantizer does this
        via mark_annotated=True in _QuantProperty.
        """
        if Q_ANNOTATION_KEY not in node.meta:
            node.meta[Q_ANNOTATION_KEY] = QuantizationAnnotation()
        node.meta[Q_ANNOTATION_KEY]._annotated = True
        annotation_info = ArmAnnotationInfo(quantized=True)
        meta_custom = node.meta.get("custom", {})
        meta_custom[ArmAnnotationInfo.CUSTOM_META_KEY] = dict(annotation_info)
        node.meta["custom"] = meta_custom

Copilot AI Feb 6, 2026


The implementation of _mark_param_node_as_annotated duplicates the exact logic from mark_node_as_annotated in backends/arm/quantizer/arm_quantizer_utils.py. Consider importing and reusing the existing function instead of duplicating the code to improve maintainability and reduce the risk of divergence.
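For reference, a short sketch of the suggested reuse; the import path is inferred from the file mentioned above and is an assumption:

    # Hypothetical: delegate to the existing ARM helper instead of duplicating it.
    from executorch.backends.arm.quantizer.arm_quantizer_utils import (
        mark_node_as_annotated,
    )

    def _mark_param_node_as_annotated(self, node: Node) -> None:
        """Keep the annotation logic in one place by reusing the shared helper."""
        mark_node_as_annotated(node)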

Collaborator

@AdrianLundell AdrianLundell left a comment


Hi, this PR needs major changes I'm afraid.

  1. The changes to fold_qdq_with_annotated_qparams_pass and propagate_qparams_pass are very likely not needed; rather, they are masking a faulty implementation of either the add_mm or the integration in the aot_arm_compiler.
  2. The addition of the add_mm is a significant change which should be made in a separate PR, properly tested with unittests as is done with all other ops.
  3. It would be great to add mv2 as a pytest similar to mv3; in fact I would suggest starting to get that working before adding support to the aot_arm_compiler, since the compilation pipeline is guaranteed to be working there.

@psiddh
Contributor Author

psiddh commented Feb 6, 2026

Hi, this PR needs major changes I'm afraid.

  1. The changes to fold_qdq_with_annotated_qparams_pass and propagate_qparams_pass are very likely not needed; rather, they are masking a faulty implementation of either the add_mm or the integration in the aot_arm_compiler.
  2. The addition of the add_mm is a significant change which should be made in a separate PR, properly tested with unittests as is done with all other ops.
  3. It would be great to add mv2 as a pytest similar to mv3; in fact I would suggest starting to get that working before adding support to the aot_arm_compiler, since the compilation pipeline is guaranteed to be working there.

Sure - I agree with the approach. I just wanted to share the work I've been up to recently so that we can have exactly this kind of discussion.

Context on the design choice:

The Cortex-M backend keeps addmm directly (vs ARM's decomposition to Conv2D) to leverage CMSIS-NN's optimized linear kernels. This creates a qparam propagation challenge:

When PyTorch decomposes nn.Linear to edge dialect:
linear(input, weight, bias) → addmm(bias, input, weight.T)

The weight flows through a transpose before reaching addmm:
weight → permute_copy → addmm

FoldAndAnnotateQParamsPass folds the DQ into permute, but output_qparams remains empty (no Q node after permute). The addmm node expects weight qparams at input_qparams[2], hence PropagateQParamsPass bridges this gap, as sketched below.
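An illustrative sketch of that bridging, assuming the arm-backend meta keys (input_qparams / output_qparams) and the ExportPass base class; the PR's actual PropagateQParamsPass may differ:

    import torch
    from executorch.exir.pass_base import ExportPass
    from torch.fx.passes.infra.pass_base import PassResult

    # Ops that leave quantization unchanged as data flows through them.
    PASSTHROUGH_OPS = {
        torch.ops.aten.permute_copy.default,
        torch.ops.aten.transpose_copy.int,
    }

    class PropagateQParamsSketch(ExportPass):
        def call(self, graph_module):
            for node in graph_module.graph.nodes:
                if node.op != "call_function" or node.target not in PASSTHROUGH_OPS:
                    continue
                in_qp = node.meta.get("input_qparams")
                # The folded DQ left qparams only on the permute's input side;
                # mirror them to the output side so a consumer such as addmm
                # can read its weight qparams.
                if in_qp and not node.meta.get("output_qparams"):
                    node.meta["output_qparams"] = in_qp
            return PassResult(graph_module, True)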

Proposed approach:

  1. I'll first add test_addmm.py and test_mobilenet_v2.py unit tests following the existing patterns
  2. Once those pass and validate the pipeline, we can review whether PropagateQParamsPass is the right solution or if there's a cleaner approach
  3. The aot_arm_compiler integration can follow in a subsequent PR

This way we have proper test coverage before discussing the implementation details. Let me get the unit tests working first.

@psiddh psiddh marked this pull request as draft February 6, 2026 17:47
@AdrianLundell
Collaborator

Sounds good!

When PyTorch decomposes nn.Linear to edge dialect:
linear(input, weight, bias) → addmm(bias, input, weight.T)

The weight flows through a transpose before reaching addmm:
weight → permute_copy → addmm

I think the issue here is that you are not using the EdgeCompileConfig used in the tester:

    import torch
    from executorch.exir import EdgeCompileConfig

    config = EdgeCompileConfig(
        preserve_ops=[
            torch.ops.aten.linear.default,
            torch.ops.aten.hardsigmoid.default,
            torch.ops.aten.hardsigmoid_.default,
            torch.ops.aten.hardswish.default,
            torch.ops.aten.hardswish_.default,
        ],
        _check_ir_validity=False,
        _core_aten_ops_exception_list=[torch.ops.aten.max_pool2d.default],
    )

When linear is not decomposed you avoid the issues around q/dq folding. In general, the design philosophy is that we want the decompositions and annotations to produce correct q/dq values directly, rather than handling special cases in the folding, as that gets complex very quickly in our previous experience with the arm backend.
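For context, a minimal sketch of applying such a config when lowering to edge, assuming the executorch.exir to_edge API:

    from executorch.exir import to_edge

    # `exported_program` is the torch.export.ExportedProgram of the quantized
    # model. With preserve_ops set, aten.linear survives lowering instead of
    # decomposing into permute + addmm, so no q/dq folding bridge is needed.
    edge = to_edge(exported_program, compile_config=config)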

@psiddh psiddh changed the title from "Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend via Aot Compiler script" to "Add Cortex-M as a first-class target in aot_arm_compiler" Feb 19, 2026
@psiddh psiddh marked this pull request as ready for review February 19, 2026 16:55
Copilot AI review requested due to automatic review settings February 19, 2026 16:55
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.



Comment on lines +852 to +859
pass_instances = []
for pass_cls in CortexMPassManager.pass_list:
    sig = inspect.signature(pass_cls.__init__)
    if "exported_program" in sig.parameters:
        pass_instances.append(pass_cls(edge.exported_program()))
    else:
        pass_instances.append(pass_cls())
edge = edge.transform(pass_instances)

Copilot AI Feb 19, 2026


Manual pass instantiation duplicates logic from CortexMPassManager.transform(). The code here manually inspects each pass class and instantiates it based on whether it accepts an exported_program parameter, which duplicates the exact same logic already present in CortexMPassManager.transform(). Consider simplifying this by directly using the CortexMPassManager instead of manually instantiating passes. For example: pass_manager = CortexMPassManager(edge.exported_program()); edge_ep = pass_manager.transform(); edge = EdgeProgramManager({"forward": edge_ep}, ...)

@psiddh
Contributor Author

Already tried the CortexMPassManager approach, and it broke things: CortexMPassManager.transform() returns an ExportedProgram, not an EdgeProgramManager. The edge object was left untransformed, resulting in 351 raw aten ops.

The cleanest approach: add an instantiate_passes method to CortexMPassManager that extracts the inspect logic into a reusable method. Then both transform() and to_edge_cortex_m() can use it without duplication; that can be a follow-up PR. A sketch is below.
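A hypothetical sketch of that refactor (the method name and structure are illustrative, not the PR's final code):

    import inspect

    class CortexMPassManager:
        pass_list = []  # populated with pass classes elsewhere in the backend

        @classmethod
        def instantiate_passes(cls, exported_program):
            """Build pass instances, passing the program only to passes that need it."""
            instances = []
            for pass_cls in cls.pass_list:
                sig = inspect.signature(pass_cls.__init__)
                if "exported_program" in sig.parameters:
                    instances.append(pass_cls(exported_program))
                else:
                    instances.append(pass_cls())
            return instances

    # Both transform() and to_edge_cortex_m() could then share:
    #     passes = CortexMPassManager.instantiate_passes(edge.exported_program())
    #     edge = edge.transform(passes)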


Labels

  • ciflow/trunk
  • CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
  • module: microcontrollers (For embedded MCUs like Cortex-M, or RTOS like Zephyr; does not track the NPU backend like Arm Ethos.)
  • partner: arm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm)


3 participants
